Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve dependency validator's performance #417

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

xronos-i-am
Copy link

@xronos-i-am xronos-i-am commented Nov 7, 2024

What are you trying to accomplish?

Packwerk::Graph has its own implementation of the topological sorting algorithm. Ruby has stdlib module TSort that does the same thing (by the way, rails use it). I replace own implementation with TSort

What approach did you choose and why?

TSort works much faster (benchmarks attached) with almost same output results. And it does not require attention for support own implementation.

What should reviewers focus on?

The cycles shown in the test #cycles returns overlapping cycles in a graph differ from the original results. But the meaning remains the same (all cyclic dependencies are listed). Only the output format has changed (in some cases). I suppose that can be sacrificed in favour of performance

Type of Change

Basic functionality is intact, but there is a slight difference in the output of the results of dependency validator

Checklist

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • It is safe to rollback this change.

Benchmarks

# frozen_string_literal: true

require "bundler/inline"
require "tsort"

gemfile do
  source "https://rubygems.org"

  gem "benchmark-ips", require: "benchmark/ips"
  gem "packwerk"
end

def arr_to_hsh(arr)
  arr.group_by(&:first).transform_values { |arr_list| arr_list.map(&:last) }
end

module Packwerk
  class GraphWithTsort
    include TSort

    extend T::Sig
    sig do
      params(
        # The edges of the graph; represented as an Hash of Arrays.
        edges: T::Hash[T.any(String, Integer, NilClass), T::Array[T.any(String, Integer, NilClass)]]
      ).void
    end
    def initialize(edges)
      @edges = edges
    end

    def cycles
      @cycles ||= strongly_connected_components.reject { _1.size == 1 }
    end

    def acyclic?
      cycles.empty?
    end

    private def tsort_each_node(&block)
      @edges.each_key(&block)
    end

    EMPTY_ARRAY = [].freeze
    private_constant :EMPTY_ARRAY

    private def tsort_each_child(node, &block)
      (@edges[node] || EMPTY_ARRAY).each(&block)
    end
  end

  private_constant :GraphWithTsort

  arr = [[1, 2], [1, 3], [2, 4], [3, 4]]
  hsh = arr_to_hsh(arr)

  Benchmark.ips do |x|
    x.report("[original] test acyclic graph") do
      Graph.new(arr).acyclic?
    end

    x.report("[tsort] test acyclic graph") do
      GraphWithTsort.new(hsh).acyclic?
    end

    x.compare!
  end

  # Warming up --------------------------------------
  # [original] test acyclic graph     3.167k i/100ms
  #    [tsort] test acyclic graph    21.181k i/100ms
  #
  # Calculating -------------------------------------
  # [original] test acyclic graph    30.041k (± 1.2%) i/s   (33.29 μs/i) -    152.016k in   5.060974s
  #    [tsort] test acyclic graph   197.479k (± 1.3%) i/s    (5.06 μs/i) -    995.507k in   5.041948s
  #
  # Comparison:
  #    [tsort] test acyclic graph:   197479.2 i/s
  # [original] test acyclic graph:    30041.5 i/s - 6.57x  slower

  arr = [[1, 2], [2, 3], [3, 1]]
  hsh = arr_to_hsh(arr)

  Benchmark.ips do |x|
    x.report("[original] test cyclic graph") do
      Graph.new(arr).acyclic?
    end

    x.report("[tsort] test cyclic graph") do
      GraphWithTsort.new(hsh).acyclic?
    end

    x.compare!
  end

  # Warming up --------------------------------------
  # [original] test cyclic graph
  #            3.076k i/100ms
  #    [tsort] test cyclic graph
  #            30.150k i/100ms
  # Calculating -------------------------------------
  # [original] test cyclic graph
  #            29.158k (± 4.3%) i/s   (34.30 μs/i) -    147.648k in   5.074331s
  #    [tsort] test cyclic graph
  #            251.707k (±12.1%) i/s   (3.97 μs/i) -      1.266M in   5.103329s
  #
  # Comparison:
  #    [tsort] test cyclic graph:   251707.4 i/s
  # [original] test cyclic graph:    29157.8 i/s - 8.63x  slower

  arr = [
    [1, 2], [2, 3], [3, 1],
    [4, 5], [4, 6], [5, 7], [6, 7],
    [8, 9], [9, 8], [8, 10], [10, 11], [8, 11],
  ]
  hsh = arr_to_hsh(arr)

  Benchmark.ips do |x|
    x.report("[original] test cycles in a graph with disjoint subgraphs") do
      Graph.new(arr).cycles
    end

    x.report("[tsort] test cycles in a graph with disjoint subgraphs") do
      GraphWithTsort.new(hsh).cycles
    end

    x.compare!
  end

  # Warming up --------------------------------------
  # [original] test cycles in a graph with disjoint subgraphs
  #            756.000 i/100ms
  #    [tsort] test cycles in a graph with disjoint subgraphs
  #            7.294k i/100ms
  # Calculating -------------------------------------
  # [original] test cycles in a graph with disjoint subgraphs
  #            7.031k (±11.0%) i/s  (142.24 μs/i) -   34.776k in   5.010366s
  #    [tsort] test cycles in a graph with disjoint subgraphs
  #            85.352k (±10.6%) i/s  (11.72 μs/i) -  423.052k in   5.016072s
  #
  # Comparison:
  #    [tsort] test cycles in a graph with disjoint subgraphs:    85351.8 i/s
  # [original] test cycles in a graph with disjoint subgraphs:     7030.5 i/s - 12.14x  slower
end

@xronos-i-am xronos-i-am requested a review from a team as a code owner November 7, 2024 06:48
@xronos-i-am
Copy link
Author

I have signed the CLA!

@xronos-i-am
Copy link
Author

@gmcgibbon can you check, please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant